Policy Networks with Two-Stage Training for Dialogue Systems

Authors

  • Mehdi Fatemi
  • Layla El Asri
  • Hannes Schulz
  • Jing He
  • Kaheer Suleman

Abstract

In this paper, we propose to use deep policy networks trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian process methods. Summary state and action spaces lead to good performance but require pre-engineering effort, RL knowledge, and domain expertise. To remove the need to define such summary spaces, we show that deep RL can also be trained efficiently on the original state and action spaces. Dialogue systems based on partially observable Markov decision processes are known to require many dialogues to train, which makes them unappealing for practical deployment. We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred dialogues collected with a handcrafted policy, the actor-critic deep learner is considerably bootstrapped from a combination of supervised and batch RL. In addition, convergence to an optimal policy is significantly sped up compared to other deep RL methods initialized on the data with batch RL. All experiments are performed on a restaurant domain derived from the Dialogue State Tracking Challenge 2 (DSTC2) dataset.
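The two training stages named in the abstract (supervised learning on dialogues from a handcrafted policy, then advantage actor-critic RL) can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: it assumes a single-state tabular softmax policy and a scalar critic instead of deep networks, and the function names, learning rates, and the one-step advantage estimate `r + gamma * V(s') - V(s)` are illustrative choices not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_step(logits, expert_action, lr=0.5):
    """Stage 1 (assumed form): one cross-entropy gradient step that moves
    the policy toward the action chosen by the handcrafted policy."""
    probs = softmax(logits)
    # Gradient of -log pi(expert_action) w.r.t. the logits is probs - one_hot.
    return [
        l - lr * (p - (1.0 if i == expert_action else 0.0))
        for i, (l, p) in enumerate(zip(logits, probs))
    ]


def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    target = reward + (0.0 if done else gamma * value_next)
    return target - value_s


def actor_critic_step(logits, value_s, action, reward, value_next,
                      gamma=0.99, done=False, lr_actor=0.1, lr_critic=0.1):
    """Stage 2 (assumed form): one advantage actor-critic update.

    The actor ascends advantage * grad log pi(action); the critic moves
    V(s) toward the one-step target by the same advantage (TD error).
    """
    adv = td_advantage(reward, value_s, value_next, gamma, done)
    probs = softmax(logits)
    # Gradient of log pi(action) w.r.t. the logits is one_hot - probs.
    new_logits = [
        l + lr_actor * adv * ((1.0 if i == action else 0.0) - p)
        for i, (l, p) in enumerate(zip(logits, probs))
    ]
    new_value = value_s + lr_critic * adv
    return new_logits, new_value, adv


if __name__ == "__main__":
    # Stage 1: imitate a hypothetical handcrafted policy that picks action 1.
    logits = [0.0, 0.0, 0.0]
    for _ in range(5):
        logits = supervised_step(logits, expert_action=1)
    # Stage 2: refine with one actor-critic update on a rewarded final turn.
    logits, value, adv = actor_critic_step(
        logits, value_s=0.0, action=1, reward=1.0, value_next=0.0, done=True
    )
    print(softmax(logits), value, adv)
```

In this toy setting, a positive advantage raises the probability of the action taken and a negative one lowers it. The supervised stage only gives the policy a starting point better than uniform before RL begins.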

Similar resources

Promotion of Efficiency of Township Health Care Network's Administrative, Financial and Supplies System; by Providing Administrative and Financial Facilities

Today, reducing bureaucracy in administrative systems is considered a main factor in their efficiency. This policy has been put forward by the Ministry of Health and Medical Education for application in all health care networks of Iran. According to this policy, the health center of Chaharmahal and Bakhtiary has carried out a project in Boroujen health care, sin...

Optimization of two-stage production/inventory systems under order base stock policy with advance demand information

It is important to share demand information among the members in supply chains. In recent years, production and inventory systems with advance demand information (ADI) have been discussed, where advance demand information means the information of demand which the decision maker obtains before the corresponding actual demand arrives. Appropriate production and inventory control using demand info...

On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data

The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...

Two-Stage Stochastic Day-Ahead Market Clearing in Gas and Power Networks Integrated with Wind Energy

The significant penetration of wind turbines in power systems has created operational challenges, such as large-scale power fluctuations induced by wind farms. Gas-fired plants, with their fast starting ability and high ramping rates, can handle the natural uncertainty of wind power better than other traditional plants. Therefore, the integration of electrical and natural gas syste...

Publication date: 2016